Neural network pruning has been a well-established compression technique to enable deep learning models on resource-constrained devices. The pruned model is usually specialized to meet specific hardware platforms and training tasks (defined as deployment scenarios). However, existing pruning approaches rely heavily on training data to trade off model size, efficiency, and accuracy, which is ineffective for federated learning (FL) over distributed and confidential datasets. Moreover, the memory- and compute-intensive pruning process of most existing approaches cannot be handled by most FL devices with resource limitations. In this paper, we develop FedTiny, a novel distributed pruning framework for FL, to obtain specialized tiny models for memory- and computing-constrained participating devices with confidential local data. To alleviate biased pruning due to unseen heterogeneous data over devices, FedTiny introduces an adaptive batch normalization (BN) selection module to adaptively obtain an initially pruned model that fits the deployment scenario. In addition, to further refine the initial pruning, FedTiny develops a lightweight progressive pruning module for finer local pruning under tight memory and computational budgets, where the pruning policy for each layer is determined gradually rather than by evaluating the overall deep model structure. Extensive experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art baseline approaches, especially when compressing deep models to extremely sparse tiny models.
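The abstract does not give FedTiny's pruning criterion. As a rough illustration of deciding the pruning policy one layer at a time, rather than ranking the whole model at once, the sketch below uses plain per-layer magnitude pruning. The magnitude criterion, the per-layer sparsity list, and the function names are all assumptions, not FedTiny's method.

```python
import math

def magnitude_mask(weights, sparsity):
    """0/1 mask that zeros the `sparsity` fraction of entries with the smallest |w|."""
    n_prune = int(math.floor(sparsity * len(weights)))
    # Indices ordered by absolute magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def layer_by_layer_prune(layers, sparsity_per_layer):
    """Hypothetical progressive loop: decide one layer's mask per round, so only that
    layer's scores are held at a time, never a ranking over the whole model."""
    masks = []
    for weights, sparsity in zip(layers, sparsity_per_layer):
        masks.append(magnitude_mask(weights, sparsity))
        # In a federated setting, a round of local fine-tuning could follow here.
    return masks
```

Deciding each layer separately keeps the peak memory proportional to one layer, at the cost of ignoring cross-layer trade-offs that a global ranking would capture.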
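Quantum computers are next-generation devices that hold promise to perform computations beyond the reach of classical computers. A leading approach towards this goal is quantum machine learning, especially quantum generative learning. Due to the intrinsically probabilistic nature of quantum mechanics, it is reasonable to postulate that quantum generative learning models (QGLMs) may surpass their classical counterparts. As such, QGLMs are receiving growing attention from the quantum physics and computer science communities, and various QGLMs that can be efficiently implemented on near-term quantum machines with potential computational advantages have been proposed. In this paper, we review the current progress of QGLMs from the perspective of machine learning. In particular, we interpret these QGLMs, covering quantum circuit Born machines, quantum generative adversarial networks, quantum Boltzmann machines, and quantum autoencoders, as quantum extensions of classical generative learning models. In this context, we explore their intrinsic relations and their fundamental differences. We further summarize the potential applications of QGLMs in both conventional machine learning tasks and quantum physics. Finally, we discuss the challenges and further research directions for QGLMs.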
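For readers new to quantum circuit Born machines: such a model samples a bitstring $x$ with probability $p_\theta(x) = |\langle x|\psi(\theta)\rangle|^2$ (the Born rule). The toy sketch below simulates that rule classically for the simplest possible circuit, one $R_y(\theta_k)$ rotation per qubit applied to $|0\rangle$ with no entangling gates. That restriction is an assumption made for brevity (practical QCBMs add entangling layers); under it, each qubit reads 1 with probability $\sin^2(\theta_k/2)$.

```python
import itertools
import math
import random

def born_probabilities(thetas):
    """Exact Born-rule distribution p(x) = |<x|psi>|^2 for a product of RY(theta)|0> qubits
    (toy assumption: no entangling gates)."""
    probs = {}
    for bits in itertools.product((0, 1), repeat=len(thetas)):
        amp = 1.0
        for b, theta in zip(bits, thetas):
            # RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
            amp *= math.sin(theta / 2) if b else math.cos(theta / 2)
        probs[bits] = amp * amp
    return probs

def sample(probs, n_samples, seed=0):
    """Draw bitstrings from the Born distribution, as measuring the circuit would."""
    rng = random.Random(seed)
    outcomes = list(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes], k=n_samples)
```

Training a QCBM then means adjusting the angles so that these sampled bitstrings match a target data distribution.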
Font generation is a difficult and time-consuming task, especially in languages that use ideograms with complicated structures and a large number of characters, such as Chinese. To solve this problem, few-shot and even one-shot font generation have attracted a lot of attention. However, most existing font generation methods may still suffer from (i) the large cross-font gap; (ii) subtle cross-font variations; and (iii) incorrect generation of complicated characters. In this paper, we propose a novel one-shot font generation method based on a diffusion model, named Diff-Font, which can be stably trained on large datasets. The proposed model aims to generate an entire font library given only one sample as the reference. Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and completeness of each generated character. To the best of our knowledge, Diff-Font is the first work to apply diffusion models to the font generation task. The well-trained Diff-Font is not only robust to font gaps and font variations, but also achieves promising performance on generating difficult characters. Compared to previous font generation methods, our model achieves state-of-the-art performance both qualitatively and quantitatively.
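Diff-Font's stroke-wise conditioning is not detailed in the abstract. For orientation, the sketch below shows only the generic closed-form forward (noising) process of a DDPM-style diffusion model, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. The linear $\beta$ schedule and its endpoints are assumed values, not taken from the paper.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an (assumed) linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I) in one step.
    Returns the noisy sample and the noise, which a denoiser is trained to predict."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise
```

A conditional model such as Diff-Font would feed extra inputs (for example, a reference-style and content encoding) to the denoiser; the noising step itself is unconditional.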
Recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling multimodal signals. In this work, we harness these traits and present a unified multimodal generation model that can conduct both "modality translation" and "multi-modality generation" tasks using a single model, performing text-based, image-based, and even simultaneous vision-language generation. Specifically, we unify the discrete diffusion process for multimodal signals by proposing a unified transition matrix. Moreover, we design a mutual attention module with a fused embedding layer and a unified objective function to emphasise the inter-modal linkages, which are vital for multi-modality generation. Extensive experiments indicate that our proposed method performs comparably to state-of-the-art solutions in various generation tasks.
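One way to picture a transition matrix shared across modalities, sketched under assumptions: following the mask-and-replace design used in some discrete diffusion models, each token keeps its value with probability $\alpha$, jumps to a shared absorbing [MASK] token with probability $\gamma$, and is otherwise resampled uniformly, here restricted to tokens of its own modality. The within-modality restriction, the parameter names, and the helper functions are hypothetical; the paper's actual matrix may differ.

```python
import random

def unified_transition_matrix(modality_sizes, alpha, gamma):
    """Row-stochastic Q[i][j] = P(x_t = j | x_{t-1} = i). Tokens of each modality occupy
    consecutive index ranges; the last index is one [MASK] token shared by all modalities."""
    if alpha < 0 or gamma < 0 or alpha + gamma > 1:
        raise ValueError("need alpha, gamma >= 0 and alpha + gamma <= 1")
    total = sum(modality_sizes)
    mask = total
    q = [[0.0] * (total + 1) for _ in range(total + 1)]
    start = 0
    for size in modality_sizes:
        beta = (1.0 - alpha - gamma) / size  # uniform replacement within this modality only
        for i in range(start, start + size):
            for j in range(start, start + size):
                q[i][j] = beta + (alpha if i == j else 0.0)
            q[i][mask] = gamma  # corrupt to [MASK]
        start += size
    q[mask][mask] = 1.0  # [MASK] is absorbing
    return q

def corrupt(tokens, q, rng):
    """Apply one forward diffusion step to a token sequence."""
    return [rng.choices(range(len(q)), weights=q[t])[0] for t in tokens]
```

For example, with modality sizes `[3, 2]` (three text tokens, two image tokens), indices 0 to 2 are text, 3 and 4 are image, and 5 is the shared [MASK].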
Text-guided diffusion models have shown superior performance in image/video generation and editing, but few explorations have been performed in 3D scenarios. In this paper, we discuss three fundamental and interesting problems on this topic. First, we equip text-guided diffusion models to achieve $\textbf{3D-consistent generation}$. Specifically, we integrate a NeRF-like neural field to generate low-resolution coarse results for a given camera view. Such results can provide 3D priors as conditioning information for the subsequent diffusion process. During denoising diffusion, we further enhance 3D consistency by modeling cross-view correspondences with a novel two-stream (corresponding to two different views) asynchronous diffusion process. Second, we study $\textbf{3D local editing}$ and propose a two-step solution that can generate 360$^{\circ}$ manipulated results by editing an object from a single view. In step 1, we perform 2D local editing by blending the predicted noises. In step 2, we conduct a noise-to-text inversion process that maps the 2D blended noises into the view-independent text embedding space. Once the corresponding text embedding is obtained, 360$^{\circ}$ images can be generated. Last but not least, we extend our model to perform $\textbf{one-shot novel view synthesis}$ by fine-tuning on a single image, showing for the first time the potential of leveraging text guidance for novel view synthesis. Extensive experiments and various applications show the prowess of our 3DDesigner. The project page is available at https://3ddesigner-diffusion.github.io/.
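The 3DDesigner abstract says step 1 of local editing blends predicted noises but does not give the formula. A common form, assumed here, is a mask-weighted blend $\hat\epsilon = m \odot \epsilon_{\text{edit}} + (1-m)\odot\epsilon_{\text{orig}}$, where $m$ marks the region to edit; $\epsilon_{\text{edit}}$ and $\epsilon_{\text{orig}}$ are the denoiser's noise predictions under the edit prompt and the original prompt. Images are flattened to 1D lists for brevity, and the function name is hypothetical.

```python
def blend_noise(eps_edit, eps_orig, mask):
    """Per-pixel blend (assumed form): edited-prompt noise where mask = 1,
    original-prompt noise where mask = 0, linear mix for soft mask values."""
    return [m * e + (1.0 - m) * o for e, o, m in zip(eps_edit, eps_orig, mask)]
```

A soft mask (values between 0 and 1) softens the seam between the edited and untouched regions.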
The goal of 3D pose transfer is to transfer the pose from the source mesh to the target mesh while preserving the identity information (e.g., face, body shape) of the target mesh. Deep learning-based methods have improved the efficiency and performance of 3D pose transfer. However, most of them are trained under the supervision of ground truth, whose availability is limited in real-world scenarios. In this work, we present X-DualNet, a simple yet effective approach that enables unsupervised 3D pose transfer. In X-DualNet, we introduce a generator $G$ that contains correspondence learning and pose transfer modules to achieve 3D pose transfer. We learn the shape correspondence by solving an optimal transport problem without any keypoint annotations, and generate high-quality meshes with our elastic instance normalization (ElaIN) in the pose transfer module. With $G$ as the basic component, we propose a cross consistency learning scheme and a dual reconstruction objective to learn pose transfer without supervision. In addition, we adopt an as-rigid-as-possible deformer in the training process to fine-tune the body shape of the generated results. Extensive experiments on human and animal data demonstrate that our framework achieves performance comparable to state-of-the-art supervised approaches.
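X-DualNet learns correspondences by solving an optimal transport problem; the abstract does not say which solver or cost it uses. A standard choice for a differentiable OT layer is entropic regularization solved with Sinkhorn iterations, sketched below with uniform marginals over source and target vertices. The cost matrix, $\epsilon$, iteration count, and helper names are illustrative assumptions.

```python
import math

def sinkhorn(cost, epsilon=0.1, n_iters=200):
    """Entropic OT between uniform marginals on rows (source vertices) and columns
    (target vertices) of `cost`; returns the soft transport plan P."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    k = [[math.exp(-c / epsilon) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale so that row sums match a and column sums match b.
        u = [a[i] / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * k[i][j] * v[j] for j in range(m)] for i in range(n)]

def hard_correspondence(plan):
    """Index of the most strongly matched target vertex for each source vertex."""
    return [max(range(len(row)), key=row.__getitem__) for row in plan]
```

In a learned pipeline the cost would typically be a distance between per-vertex features, so gradients can flow through the soft plan.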
The mainstream workflow of image recognition applications is to first train one global model in the cloud over a wide range of classes and then serve numerous clients, each holding heterogeneous images from a small subset of classes to be recognized. Given the discrepancy between the cloud and the clients in the range of image classes, the recognition model should be highly adaptive, intuitively by concentrating on each individual client's local, dynamic class subset, while incurring negligible overhead. In this work, we propose to plug a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models, requiring only one-time cloud-based training to be client-adaptive. In particular, given a target image from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's historical unlabeled images, thereby calibrating the focus and the recognition result. Further considering that ICIIA's overhead is dominated by linear projection, we propose replacing it with partitioned linear projection with feature shuffling, which allows the number of partitions to be increased to dramatically improve efficiency without sacrificing too much accuracy. We finally evaluate ICIIA using 3 different recognition tasks with 9 backbone models over 5 representative datasets. Extensive evaluation results demonstrate the effectiveness and efficiency of ICIIA. Specifically, for ImageNet-1K with the backbone models of MobileNetV3-L and Swin-B, ICIIA improves the testing accuracy to 83.37% (+8.11%) and 88.86% (+5.28%), while adding only 1.62% and 0.02% of FLOPs, respectively.
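The partitioned linear projection can be read as a block-diagonal matrix: splitting a $d$-dimensional feature into $P$ groups with one $(d/P)\times(d/P)$ block each cuts the weights and multiply-adds from $d^2$ to $d^2/P$, and a channel shuffle between layers lets information cross groups. The sketch below assumes a ShuffleNet-style shuffle and random Gaussian initialization; ICIIA's exact layout is not given in the abstract, and the function names are hypothetical.

```python
import random

def make_partitioned_projection(dim, partitions, seed=0):
    """One (dim/P x dim/P) block per partition: dim^2 / P weights instead of dim^2."""
    if dim % partitions:
        raise ValueError("dim must be divisible by partitions")
    g = dim // partitions
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, g ** -0.5) for _ in range(g)] for _ in range(g)]
            for _ in range(partitions)]

def partitioned_project(x, blocks):
    """Apply each block to its own contiguous slice of x (a block-diagonal matmul)."""
    g = len(blocks[0])
    out = []
    for p, w in enumerate(blocks):
        chunk = x[p * g:(p + 1) * g]
        out.extend(sum(w[r][c] * chunk[c] for c in range(g)) for r in range(g))
    return out

def channel_shuffle(x, partitions):
    """Interleave groups (reshape to P x g, transpose, flatten) so the next
    partitioned layer mixes features that came from different groups."""
    g = len(x) // partitions
    return [x[p * g + i] for i in range(g) for p in range(partitions)]
```

Without the shuffle, stacking partitioned projections would keep each group isolated forever; the shuffle is what recovers some of the cross-feature mixing a dense projection provides.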
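Few-shot part segmentation aims to separate the different parts of an object given only a few annotated samples. Due to the challenge of limited data, existing works mainly focus on learning classifiers over pre-trained features, failing to learn task-specific features for part segmentation. In this paper, we propose to learn task-specific features in a "pre-training"-"fine-tuning" paradigm. We conduct prompt designing to reduce the gap between the pre-training task (i.e., image generation) and the downstream task (i.e., part segmentation), so that the GAN priors for generation can be leveraged for segmentation. This is achieved by projecting part segmentation maps into the RGB space and interpolating between the RGB segmentation maps and the original images. Specifically, we design a fine-tuning strategy to progressively tune an image generator into a segmentation generator, where the supervision of the generator varies from images to segmentation maps via interpolation. Moreover, we propose a two-stream architecture, i.e., a segmentation stream to generate task-specific features and an image stream to provide spatial constraints. The image stream can be regarded as a self-supervised autoencoder, which enables our model to benefit from large-scale support images. Overall, this work is an attempt to explore the internal relevance between generation tasks and perception tasks through prompt designing. Extensive experiments show that our model achieves state-of-the-art performance on several part segmentation datasets.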
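The prompt design for few-shot part segmentation projects part labels into RGB and interpolates toward the image, so the generator's target moves gradually from images to segmentation maps. A minimal sketch, assuming a fixed color palette per part, nested-list images, and a linear schedule for the interpolation weight $\lambda$ (the schedule and function names are hypothetical):

```python
def seg_to_rgb(label_map, palette):
    """Project a part-label map (H x W ints) into RGB with one fixed color per part."""
    return [[palette[label] for label in row] for row in label_map]

def interpolate(image, seg_rgb, lam):
    """Supervision target (1 - lam) * image + lam * seg_rgb, per pixel and channel."""
    return [[tuple((1.0 - lam) * a + lam * b for a, b in zip(px_img, px_seg))
             for px_img, px_seg in zip(row_img, row_seg)]
            for row_img, row_seg in zip(image, seg_rgb)]

def lam_schedule(step, total_steps):
    """Hypothetical linear schedule: pure images at step 0, pure RGB maps at the end."""
    return min(1.0, step / total_steps)
```

At $\lambda = 0$ the target is an ordinary image, which the pre-trained GAN already produces; raising $\lambda$ slowly lets the generator drift toward emitting colored part maps without discarding its priors.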
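Data-efficient GANs (DE-GANs), which aim to learn generative models with a limited amount of training data, encounter several challenges in generating high-quality samples. Since data augmentation strategies have largely alleviated training instability, how to further improve the generative performance of DE-GANs has become a hotspot. Recently, contrastive learning has shown great potential for increasing the synthesis quality of DE-GANs, yet the related principles are not well explored. In this paper, we revisit and compare different contrastive learning strategies in DE-GANs, and identify that (i) the current bottleneck of generative performance is the discontinuity of the latent space; and (ii) compared to other contrastive learning strategies, instance perturbation works towards latent space continuity, which brings the major improvement to DE-GANs. Based on these observations, we propose FakeCLR, which applies contrastive learning only to perturbed fake samples, and devise three related training techniques: noise-related latent augmentation, a diversity-aware queue, and a forgetting factor of the queue. Our experimental results establish a new state of the art on both few-shot generation and limited-data generation. On multiple datasets, FakeCLR achieves more than 15% FID improvement compared to existing DE-GANs. Code is available at https://github.com/iceli1007/fakeclr.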
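FakeCLR applies contrastive learning only to perturbed fake samples. A hedged sketch of the ingredients: perturb a latent code $z$ with small Gaussian noise to obtain $z'$, treat the features of $G(z)$ and $G(z')$ as a positive pair, and score them with an InfoNCE loss against negatives (for example, features held in a queue). The noise scale, temperature, cosine similarity, and function names are assumptions; the generator and feature extractor are left out.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log( exp(s+/t) / (exp(s+/t) + sum_k exp(s-_k/t)) ) with cosine similarities."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_denom - logits[0]

def perturb_latent(z, sigma, rng):
    """Noise-perturbed copy z' of a latent code; G(z) and G(z') form a positive pair."""
    return [x + rng.gauss(0.0, sigma) for x in z]
```

Pulling $G(z)$ and $G(z')$ together encourages nearby latents to map to similar features, which is the latent-space continuity the abstract identifies as the bottleneck.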
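Image paragraph captioning aims to describe a given image with a sequence of coherent sentences. Most existing methods model coherence through topic transition, which derives the topic vector from the preceding sentences. However, these methods still suffer from immediate or delayed repetition in the generated paragraphs because (i) the entanglement of syntax and semantics distracts the topic vector from attending to pertinent visual regions; and (ii) there are few constraints or rewards for learning long-range transitions. In this paper, we propose a bypass network that separately models the semantics and the linguistic syntax of preceding sentences. Specifically, the proposed model consists of two main modules, i.e., a topic transition module and a sentence generation module. The former takes the previous semantic vectors as queries and applies an attention mechanism to regional features to acquire the next topic vector, which reduces immediate repetition by excluding linguistic information. The latter decodes the topic vector and the preceding syntax state to produce the following sentence. To further reduce delayed repetition in the generated paragraphs, we devise a replacement-based reward for reinforcement learning. Comprehensive experiments on the widely used benchmark demonstrate the superiority of the proposed model over the state of the art for coherence while maintaining high accuracy.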
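In the bypass network for paragraph captioning, the topic transition module uses the previous semantic vector as a query over regional features. A minimal sketch with scaled dot-product attention, assuming the region features serve directly as keys and values (no learned projections) and that the weighted sum is taken as the next topic vector; the function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, region_features):
    """Scaled dot-product attention: the previous semantic vector queries the region
    features, and the attention-weighted sum is returned as the next topic vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, r)) / math.sqrt(d) for r in region_features]
    weights = softmax(scores)
    topic = [sum(w * r[i] for w, r in zip(weights, region_features)) for i in range(d)]
    return topic, weights
```

Because the query carries only semantic content, syntactic state from the previous sentence cannot steer which regions are attended, which is the separation the abstract motivates.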